Ambiguous Examples

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Seo, Wooseok, Han, Seungju, Jung, Jaehun, Newman, Benjamin, Lim, Seungwon, Lee, Seungbeen, Lu, Ximing, Choi, Yejin, Yu, Youngjae

arXiv.org Artificial Intelligence

Fact verification is essential for ensuring the reliability of LLM applications. In this study, we evaluate 12 pre-trained LLMs and one specialized fact-verifier, including frontier LLMs and open-weight reasoning LLMs, using a collection of examples from 14 fact-checking benchmarks. We share three findings intended to guide future development of more robust fact verifiers. First, we highlight the importance of addressing annotation errors and ambiguity in datasets, demonstrating that ambiguous or incorrectly labeled data, roughly 16% of the collection, substantially influences model rankings. Neglecting this issue may lead to misleading conclusions during comparative evaluations, and we suggest a systematic LLM-as-a-judge pipeline to help identify these issues at scale. Second, we discover that frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance. We therefore recommend that future studies include comparisons with these simple yet highly effective baselines. Lastly, despite their effectiveness, frontier LLMs incur substantial costs, motivating the development of small, fine-tuned fact verifiers. We show that these small models still have room for improvement, particularly on instances that require complex reasoning. Encouragingly, we demonstrate that augmenting training with synthetic multi-hop reasoning data significantly enhances their capabilities in such instances. We release our code, model, and dataset at https://github.com/just1nseo/verifying-the-verifiers
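The LLM-as-a-judge idea from the first finding can be illustrated with a short sketch. The prompt wording, the `gpt-4o` model name, and the dataset fields below are illustrative assumptions, not the paper's released pipeline:

```python
# Minimal sketch of an LLM-as-a-judge pass for flagging ambiguous or
# mislabeled fact-verification examples. Prompt, model name, and fields
# are illustrative assumptions, not the paper's exact pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are auditing a fact-checking dataset.
Claim: {claim}
Evidence: {evidence}
Gold label: {label}

Answer with exactly one word:
OK - the gold label clearly follows from the evidence
AMBIGUOUS - the evidence underdetermines the label
WRONG - the gold label contradicts the evidence"""

def audit(example, model="gpt-4o"):
    """Return the judge's verdict (OK / AMBIGUOUS / WRONG) for one example."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**example)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

dataset = [  # toy stand-in for the 14-benchmark collection
    {"claim": "The Eiffel Tower is in Berlin.",
     "evidence": "The Eiffel Tower is a landmark in Paris, France.",
     "label": "SUPPORTED"},
]
flagged = [ex for ex in dataset if audit(ex) != "OK"]
print(flagged)  # candidates for re-annotation or exclusion before ranking models
```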


Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Bakalova, Aleksandra, Veitsman, Yana, Huang, Xinting, Hahn, Michael

arXiv.org Artificial Intelligence

In-Context Learning (ICL) is an intriguing ability of large language models (LLMs). Despite a substantial amount of work on its behavioral aspects and on how it emerges in miniature setups, it remains unclear which mechanism assembles task information from the individual examples in a few-shot prompt. We use causal interventions to identify information flow in Gemma-2 2B for five naturalistic ICL tasks. We find that the model infers task information using a two-step strategy we call contextualize-then-aggregate: in the lower layers, the model builds up representations of individual few-shot examples, which are contextualized by preceding examples through connections between few-shot input and output tokens across the sequence. In the higher layers, these representations are aggregated to identify the task and prepare the prediction of the next output. The importance of the contextualization step differs between tasks, and it may become more important in the presence of ambiguous examples. Overall, by providing rigorous causal analysis, our results shed light on the mechanisms through which ICL happens in language models.
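As a rough illustration of the kind of causal intervention involved, the sketch below uses TransformerLens-style activation patching: it copies the residual stream from a clean few-shot prompt into a corrupted one and checks whether the prediction is restored. The antonym task, the choice of layer and position, and the use of `transformer_lens` (which requires access to the gated Gemma weights) are assumptions for illustration, not the authors' exact setup:

```python
# Minimal activation-patching sketch: patch one position's residual stream
# from a clean run into a corrupted run and see what the model predicts.
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gemma-2-2b")  # gated HF weights

clean = "hot -> cold\nbig -> small\nfast ->"    # antonym few-shot prompt
corrupt = "hot -> cold\nbig -> large\nfast ->"  # second example corrupted

clean_tokens = model.to_tokens(clean)
corrupt_tokens = model.to_tokens(corrupt)
_, clean_cache = model.run_with_cache(clean_tokens)

LAYER, POS = 10, -1  # illustrative: patch the final token at one mid layer

def patch_resid(resid, hook):
    # Overwrite the corrupted run's residual stream at POS with the clean run's.
    resid[:, POS, :] = clean_cache[hook.name][:, POS, :]
    return resid

logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(utils.get_act_name("resid_post", LAYER), patch_resid)],
)
# If this layer/position carries the task information, the patched run should
# again predict the antonym ("slow") rather than follow the corrupted pattern.
print(model.to_string([logits[0, -1].argmax().item()]))
```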


APE: Active Learning-based Tooling for Finding Informative Few-shot Examples for LLM-based Entity Matching

Qian, Kun, Sang, Yisi, Bayat, Farima Fatahi, Belyi, Anton, Chu, Xianqi, Govind, Yash, Khorshidi, Samira, Khot, Rahul, Luna, Katherine, Nikfarjam, Azadeh, Qi, Xiaoguang, Wu, Fei, Zhang, Xianhan, Li, Yunyao

arXiv.org Artificial Intelligence

Prompt engineering is an iterative procedure that often requires extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to providing LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive, frequently entailing sifting through an extensive search space. In this demonstration, we showcase a human-in-the-loop tool called APE (Active Prompt Engineering) designed for refining prompts through active learning. Drawing inspiration from active learning, APE iteratively selects the most ambiguous examples for human feedback, which are then transformed into few-shot examples within the prompt. The demo recording accompanies the submission and can be viewed at https://youtu.be/OwQ6MQx53-Y.
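A minimal sketch of the uncertainty-sampling loop such a tool might run is shown below. The margin-based ambiguity score, the entity-pair format, and the precomputed label probabilities are illustrative stand-ins for whatever scoring APE actually performs:

```python
# Minimal uncertainty-sampling sketch: score each unlabeled entity pair by
# the margin between the model's top two label probabilities and route the
# most ambiguous ones to a human annotator.
import heapq

def margin(probs):
    """Margin between the two most likely labels; small = ambiguous."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - top[1]

def most_ambiguous(pool, k=2):
    """pool: list of (pair, probs) where probs maps label -> LLM probability."""
    return heapq.nsmallest(k, pool, key=lambda item: margin(item[1]))

pool = [  # toy pool with hypothetical LLM-assigned probabilities
    (("Acme Corp", "ACME Corporation"), {"match": 0.95, "no_match": 0.05}),
    (("J. Smith LLC", "Smith & Co"),    {"match": 0.55, "no_match": 0.45}),
    (("Foo Inc", "Bar Ltd"),            {"match": 0.10, "no_match": 0.90}),
]
for pair, probs in most_ambiguous(pool):
    print(pair, probs)  # the near-tie surfaces first for human labeling

# Each selected pair, once labeled by a human, is appended to the prompt as
# a new few-shot example before the next iteration.
```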


Automated Text Identification Using CNN and Training Dynamics

Creanga, Claudiu, Dinu, Liviu Petrisor

arXiv.org Artificial Intelligence

We used Data Maps to model and characterize the AuTexTification dataset. This provides insights into the behaviour of individual samples during training across epochs (training dynamics). We characterized the samples along three dimensions: confidence, variability, and correctness, which reveals the presence of three regions: easy-to-learn, ambiguous, and hard-to-learn examples. Using a classic CNN architecture, we found that training the model only on a subset of ambiguous examples improves its out-of-distribution generalization.
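The three Data Maps statistics can be sketched in a few lines of NumPy. The randomly generated gold-label probabilities, the 0.5 correctness cutoff (a binary-task simplification), and the region thresholds below are illustrative assumptions, not the paper's exact values:

```python
# Minimal sketch of Data Maps training-dynamics statistics.
# gold_probs[e, i] stands in for the probability the model assigns to
# sample i's gold label at the end of epoch e (shape: epochs x samples).
import numpy as np

rng = np.random.default_rng(0)
gold_probs = rng.uniform(size=(6, 1000))  # stand-in for logged training runs

confidence = gold_probs.mean(axis=0)           # mean gold-label probability
variability = gold_probs.std(axis=0)           # std of that probability
correctness = (gold_probs > 0.5).mean(axis=0)  # fraction of epochs "correct"

# The three regions, with illustrative thresholds:
easy = (confidence > 0.75) & (variability < 0.2)
hard = (confidence < 0.25) & (variability < 0.2)
ambiguous = ~(easy | hard)  # the high-variability middle band
print(easy.sum(), ambiguous.sum(), hard.sum())
```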


Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA

Stengel-Eskin, Elias, Guallar-Blasco, Jimena, Zhou, Yi, Van Durme, Benjamin

arXiv.org Artificial Intelligence

Natural language is ambiguous. Resolving ambiguous questions is key to successfully answering them. Focusing on questions about images, we create a dataset of ambiguous examples. We annotate these, grouping answers by the underlying question they address and rephrasing the question for each group to reduce ambiguity. Our analysis reveals a linguistically aligned ontology of reasons for ambiguity in visual questions. We then develop an English question-generation model which, as we demonstrate via automatic and human evaluation, produces less ambiguous questions. We further show that the question-generation objective we use allows the model to integrate answer-group information without any direct supervision.


Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data

Seedat, Nabeel, Crabbé, Jonathan, Bica, Ioana, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

High model performance, on average, can hide the fact that models may systematically underperform on subgroups of the data. We consider the tabular setting, which surfaces the unique issue of outcome heterogeneity: this is prevalent in areas such as healthcare, where patients with similar features can have different outcomes, making reliable prediction challenging. To tackle this, we propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes. We do this by analyzing the behavior of individual examples during training, based on their predictive confidence and, importantly, the aleatoric (data) uncertainty. Capturing the aleatoric uncertainty permits a principled characterization and subsequent stratification of data examples into three distinct subgroups (Easy, Ambiguous, Hard). We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets. We show that Data-IQ's characterization of examples is most robust to variation across similarly performant (yet different) models, compared to baselines. Since Data-IQ can be used with any ML model (including neural networks, gradient boosting, etc.), this property ensures consistency of data characterization while allowing flexible model selection. Taking this a step further, we demonstrate that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection. Furthermore, we highlight how the subgroups can inform reliable model usage, noting the significant impact of the Ambiguous subgroup on model generalization.
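A rough sketch of this style of stratification follows. The p(1 - p) estimate of aleatoric uncertainty matches the abstract's description in spirit, while the randomly generated checkpoint probabilities and the thresholds are illustrative assumptions rather than the released Data-IQ implementation:

```python
# Minimal Data-IQ-style stratification sketch. checkpoint_probs[e, i] stands
# in for the probability assigned to sample i's observed outcome at training
# checkpoint e; thresholds below are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(1)
checkpoint_probs = rng.uniform(size=(10, 500))

confidence = checkpoint_probs.mean(axis=0)
# Aleatoric (data) uncertainty, estimated as the mean of p * (1 - p):
aleatoric = (checkpoint_probs * (1 - checkpoint_probs)).mean(axis=0)

UPPER, LOWER, U_THRESH = 0.75, 0.25, 0.2
easy = (confidence >= UPPER) & (aleatoric < U_THRESH)
hard = (confidence <= LOWER) & (aleatoric < U_THRESH)
ambiguous = ~(easy | hard)  # high data uncertainty or middling confidence
print(easy.sum(), ambiguous.sum(), hard.sum())
```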


In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers

Pradhan, Vivek Krishna, Schaekermann, Mike, Lease, Matthew

arXiv.org Artificial Intelligence

While crowdsourcing now enables labeled data to be obtained more quickly, cheaply, and easily than ever before (Snow et al., 2008; Alonso, 2015; Sorokin and Forsyth, 2008), ensuring data quality remains something of an art, challenge, and perpetual risk. Consider a typical workflow for annotating data on Amazon Mechanical Turk (MTurk): a requester designs an annotation task, asks multiple workers to complete it, and then post-processes the labels to induce final consensus labels. Because the annotation work itself is largely opaque, with only submitted labels being observable, the requester typically has little insight into what, if any, problems workers encounter during annotation. While statistical aggregation (Sheshadri and Lease, 2013; Hung et al., 2013; Zheng et al., 2017) and multi-pass iterative refinement (Little et al., 2010a; Goto et al., 2016) methods can be employed to further improve the initial labels, there are limits to what can be achieved by post-hoc refinement after label collection. If the initial labels are poor because many workers were confused by incomplete, unclear, or ambiguous task instructions, there is a significant risk of "garbage in equals garbage out" (Vidgen and Derczynski, 2020). In contrast, consider a more traditional annotation workflow involving trusted annotators, such as that practiced by the Linguistic Data Consortium (LDC) (Griffitt and Strassel, 2016).
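As a concrete instance of the post-hoc aggregation step described above, the following sketch induces consensus labels by simple majority vote, the most basic of the cited statistical-aggregation methods; the items, labels, and agreement-rate heuristic are illustrative:

```python
# Minimal sketch of consensus-label induction from redundant worker labels
# via majority vote. Low agreement rates can flag items where instructions
# may have been unclear or ambiguous.
from collections import Counter

worker_labels = {  # toy MTurk-style redundant annotations
    "item1": ["spam", "spam", "not_spam"],
    "item2": ["not_spam", "not_spam", "not_spam"],
}

def majority_vote(labels):
    (label, count), = Counter(labels).most_common(1)
    return label, count / len(labels)  # consensus label and agreement rate

consensus = {item: majority_vote(labels) for item, labels in worker_labels.items()}
print(consensus)  # item1 has only 2/3 agreement: a candidate ambiguity flag
```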